Hierarchical Biological Pathway Data Integration and Mining

Authors

  • Shubhalaxmi Kher
  • Jianling Peng
  • Eve Syrkin Wurtele
  • Julie Dickerson
Abstract

information embedded within them are addressed. Today, a bioinformatics information system typically deals with large data sets reaching a total volume of about one terabyte [25]. Such a system serves many purposes:

  • The user can select the data sources and assign a confidence to each selected data source.
  • It organizes existing data to facilitate complex queries.
  • It infers relationships based on the stored data and subsequently predicts missing attribute values and incoming information based on multidimensional data.
  • Data marts (an extension of the data warehouse) support different query requests.

2. Data management and integration

The Pathway Resource List contains over 150 biological pathway databases and is growing [26]. Usually, the first step for the user is to identify a subset of these data sources for integration. To consolidate all the knowledge for a particular organism, the pathways from each database need to be extracted and transformed into a standard data representation before integration. Representation of the pathway data in each data source poses another challenge, as each pathway modality (for example, metabolic pathways, signal transduction pathways, protein-protein interactions, and gene regulation) has its own specific representation issues, which must be understood before attempting integration across modalities. Commonly employed styles of data integration may be implemented in different contexts and under different requirements, in order to reuse the data across applications for research collaboration. Some of the data integration and management efforts are presented in [27-32]. Several major approaches have been proposed for data integration, which can be roughly classified into five groups [33-34]: data warehousing, federated databasing, service-oriented integration, semantic integration, and wiki-based integration.

Across all of these groups, an increasingly important component of data integration is the community effort to develop a variety of biomedical ontologies, which deal more specifically with the technicality and globality of the descriptors and identifiers of information that has to be shared and integrated across various resources. A variety of approaches for data integration are discussed below.
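The extract-transform-integrate step described above, together with the per-source confidence that the user assigns, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two source formats, the `Interaction` record, the function names, and the rule of keeping the highest confidence among supporting sources are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """Hypothetical common representation: one directed pathway edge."""
    source_node: str
    target_node: str
    kind: str  # pathway modality, e.g. "metabolic" or "regulatory"


def from_source_a(rows):
    """Hypothetical source A stores (substrate, product) tuples."""
    return [Interaction(s.upper(), p.upper(), "metabolic") for s, p in rows]


def from_source_b(records):
    """Hypothetical source B stores dicts with 'regulator' and 'gene' keys."""
    return [
        Interaction(r["regulator"].upper(), r["gene"].upper(), "regulatory")
        for r in records
    ]


def integrate(sources):
    """Merge edges from all sources.

    `sources` maps a source name to (confidence, interactions).
    Assumption: an edge keeps the highest confidence among the
    sources that report it, and remembers which sources those were.
    """
    merged = {}
    for name, (confidence, interactions) in sources.items():
        for edge in interactions:
            entry = merged.setdefault(edge, {"confidence": 0.0, "sources": []})
            entry["confidence"] = max(entry["confidence"], confidence)
            entry["sources"].append(name)
    return merged


# Made-up example data: sources A and C overlap on one metabolic edge.
edges_a = from_source_a([("glucose", "g6p"), ("g6p", "f6p")])
edges_b = from_source_b([{"regulator": "tf1", "gene": "hxk1"}])
edges_c = from_source_a([("Glucose", "G6P")])

merged = integrate({
    "A": (0.9, edges_a),
    "B": (0.6, edges_b),
    "C": (0.4, edges_c),
})

for edge, info in merged.items():
    print(edge, info)
```

Normalizing identifiers (here, simple upper-casing) before merging is what lets the same edge from two sources collapse into one record. In practice this role would be played by shared ontologies and identifier mappings rather than string case.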

Similar resources

CMGSDB: integrating heterogeneous Caenorhabditis elegans data sources using compositional data mining

CMGSDB (Database for Computational Modeling of Gene Silencing) is an integration of heterogeneous data sources about Caenorhabditis elegans with capabilities for compositional data mining (CDM) across diverse domains. Besides gene, protein and functional annotations, CMGSDB currently unifies information about 531 RNAi phenotypes obtained from heterogeneous databases using a hierarchical scheme....

An Improved Semantic Schema Matching Approach

Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Processing (OLAP), data mining, the semantic web [2] and schema integration. The task is defined as finding the semantic correspondences between elements of two schemas. Recently, schema matching has attracted considerable interest in both research and practice. In this paper, we present a new impr...

Mining and integration of pathway diagrams from imaging data

Motivation: Pathway diagrams from PubMed and the World Wide Web (WWW) contain valuable, highly curated information that is difficult to reach without tools specifically designed and customized for the biological semantics and high-content density of the images. There is currently no search engine or tool that can analyze pathway images, extract their pathway components (molecules, genes, proteins, organelles...

PUBLICATION II Data integration and visualization system for enabling conceptual biology

Motivation: Integration of heterogeneous data in life sciences is a growing and recognized challenge. The problem is not only to enable the study of such data within the context of a biological question but also more fundamentally, how to represent the available knowledge and make it accessible for mining. Results: Our integration approach is based on the premise that relationships between biol...

Data integration, pathway analysis and mining for systems biology

Post-genomic molecular biology embodies high-throughput experimental techniques and hence is a data-rich field. The goal of this thesis is to develop bioinformatics methods to utilise publicly available data in order to produce knowledge and to aid mining of newly generated data. As an example of knowledge or hypothesis generation, consider function prediction of biological molecules. Assignmen...

Publication date: 2012